UDC Consortium, PO Box 90407, 2509 LK The Hague, The Netherlands
Tel.: (+31) 70 314 0509    Fax: (+31) 70 314 0667    E-mail: udc@kb.nl

The UDC MRF Database Development and Design

by P. D. Strachan and F. M. H. Oomes, UDC Consortium, The Hague
(Written 1993; chapter "Content of the MRF" updated Nov 2001)


 

Introduction

In March 1990 the Task Force on UDC System Development, which the UDC Management Board had established in October 1988, submitted its final report. Its scope had been defined as follows:

... to advise the UDC Management Board ... concerning appropriate long-term, strategic development of the Universal Decimal Classification as in its entirety an effective, flexible and durable system for use in classifying recorded information and knowledge.

Given this broad and flexible remit, the Task Force's first and primary recommendation may come as a surprise, and it shows how pragmatic its approach was:

A "standard version" of c. 60.000 subdivisions, in English, in machine readable format should be created. It should be supported by a semantic network and have a much more consistently faceted structure than at present.

This database should be completed within two years and provide the individual publishers of UDC versions with the material for compiling their editions. It should also be the basis for revision of the schedules and the starting point for "Extensions and Corrections to the UDC". To achieve this, a consortium of interested institutions should be set up. The Management Board accepted the recommendations of the Task Force, and as of 1 January 1992 FID transferred the intellectual ownership of the UDC, and the responsibility for its maintenance and development, to the UDC Consortium.


Sources

At that time the creation of the machine-readable version of the schedules, by then named the Master Reference File (MRF), had already started. For practical reasons the International Medium Edition (IME), published by BSI Standards, was selected as the basis for this database. The text was already available in digital form and, allowing for the necessary updating, its size corresponded fairly well to the 60,000 notations recommended by the Task Force. This basis has been modified and supplemented by:

  1. All revisions authorized in Extensions and Corrections to the UDC, Series 10:1 (1978) up to 14:3 (1992).
  2. Entries selected from a number of editions of around medium size that had been published after the International Medium Edition. These included the Japanese Medium Edition (1984) (especially science and technology), the Hungarian Large Abridged Edition (1991), the Serbo-Croatian Medium Edition (1991) and the French Medium Edition (first volume 1990).
  3. Additions to fill gaps in hierarchies and arrays that resulted from the selective nature of medium editions and were incompatible with the consistency required of the MRF.

 

Compilation process

The database was compiled using UNESCO's Micro CDS/ISIS version 3.0. The database design and the formats ("worksheets") for printing, editing and display were developed by Gerhard Riesthuis, senior lecturer at the University of Amsterdam, and David Strachan. Drs. Riesthuis also wrote the programs for converting the various sources to CDS/ISIS files.

The design had to accommodate a highly complicated process, because the database had to be compiled from various sources:

  1. The files of the International Medium Edition (IME), which were converted to CDS/ISIS.
  2. The material from Extensions and Corrections before Series 13, which had to be keyed in and converted.
  3. The existing text files of Extensions and Corrections Series 13 and 14, up to EC 14:3 (published in October 1992), which were converted.
  4. Separate lists of cancellations and modifications to the IME, including replacements for cancelled numbers.
  5. The selections made from the more recent medium editions.

For practical reasons the entire UDC was divided into ca. 30 subject sections, each of which was completed and edited separately. For almost every section a separate database had to be built for the material from each applicable source. The diagram below is a very simplified representation of the compilation process, which had to be carried out for each of the ca. 30 sections.

The final editing stage included, among other things, checking references, translating entries for which only a German (or in some cases French) text was available, and expanding the IME in line with more recent medium editions.

 

Database structure

The field structure of the CDS/ISIS database had to account for the various components of an entry in the UDC schedules, for the individual sources of the database content, and for the different types of intervention during editing. During compilation some of the original fields turned out to be superfluous or impractical and were never used. Some fields were declared only because they allowed selections that supported the editorial work or produced printed output in a certain format. The design went through several versions; the final one is listed almost completely below. Some fields were divided into subfields that the CDS/ISIS software could access individually. For many fields a table of codes was defined for filling them in.

Field    Function and explanation
10       Validation (codes).
         This field was used for selecting and deleting cancelled entries.
20       Source of original entry (codes).
         E.g. International Medium Edition, English Full Edition, E&C.
21       Changes made to source entry (codes).
         E.g. translated into English, updated from later revision.
24       Type of special auxiliary, if applicable (codes).
         I.e. hyphen, point-nought, apostrophe, other (e.g. ...0/...9).
25       Derived from parallel subdivision (Y/N).
26       UDC source of parallel instruction for 25.
31       Stage code for database creation (codes).
33       Changed this stage (Y/N).
         Used for special selections for editing.
40       Table (codes).
         To allow for output in the correct sequence, each of the Tables of the Common Auxiliaries, and the Main Tables as a whole, had to be individually coded.
45       Application of special auxiliary: note (coded).
         If an entry was accompanied by a note concerning the application of special auxiliaries, this was indicated by its code (see field 24).
46       Application of special auxiliary: parallel subdivision (coded).
         Analogous to 45, if the auxiliary was introduced by the parallel division.
50       Language (coded).
         Indicated availability of English and/or German text.
55       Added by other medium editions? (codes).
         Codes for the more recent medium editions mentioned above.
100      UDC number.
109      Index only.
         Used for indexing UDC numbers that were not covered by the selection tables defined for other fields.
110      UDC description: definition.
112      UDC description: verbal examples.
         Separates examples in the description from the core concept; these often contain entries from hierarchically lower levels in full editions.
120      References.
         With subfields for notation and accompanying text.
130      Notes explaining application or scope of the entry.
140      Combination examples.
         With subfields for the notation of the example, its description, annotation and references.
160      Parallel division note.
         For UDC numbers that are divided in parallel with a certain other UDC number (the source).
         With subfields for the notation and accompanying text.
162      Examples of the parallel division of field 160.
         With subfields for notation and description.
210-262  Same function as 110-162, but holding the German text where available.
900-     Fields used for different types of editorial annotation (special characters etc.).
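To make the field layout more concrete, the following is a minimal Python sketch of how one entry might be pictured as a mapping from field tag to value. Only the tags and their meanings come from the list above. The sample values, the code values ("V", "IME", "M", "EN+DE"), the subfield letters, the table ordering and the helper names `sort_key` and `german_tag` are assumptions made for illustration; they are not the codes actually used in the MRF.

```python
# Hypothetical in-memory picture of one entry in the first database design.
# Field tags follow the list above; every value and code is invented.
record = {
    10: "V",          # validation code (assumed value)
    20: "IME",        # source of original entry (assumed code)
    40: "M",          # table code, here "main tables" (assumed code)
    50: "EN+DE",      # language availability (assumed code)
    100: "53",        # UDC number
    110: "Physics",   # description in English
    120: [{"n": "621.3", "t": "Electrical engineering"}],  # references; subfield letters assumed
    210: "Physik",    # description in German (tag 110 + 100)
}

# Assumed ordering of table codes, so that auxiliaries are output before the main tables.
TABLE_ORDER = {"Ic": 0, "Id": 1, "Ie": 2, "M": 9}


def sort_key(rec):
    """Order by table code first, then by UDC number compared as plain text.

    Plain string comparison is a simplification of real UDC filing order.
    """
    return (TABLE_ORDER[rec[40]], rec[100])


def german_tag(english_tag):
    """Fields 210-262 mirror 110-162, so the German tag is the English tag + 100."""
    if not 110 <= english_tag <= 162:
        raise ValueError("only fields 110-162 have a German counterpart")
    return english_tag + 100
```

With this picture, `record[german_tag(110)]` gives the German description, and `sorted(records, key=sort_key)` shows why a separate table code (field 40) is needed to control output sequence.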

UNESCO's CDS/ISIS software proved to be a very reliable tool for compiling, editing and managing the databases, although often a demanding and not very user-friendly one. Its main advantage proved to be its flexibility in output and display of the database content, and in converting databases from one format to another.

It would, however, be very useful if it offered facilities for automated checking of references, which at present can only be done by hand using printed lists.
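As an illustration of what such an automated check could look like outside CDS/ISIS, here is a minimal Python sketch. It assumes the records have already been exported into a simple list of dictionaries; the keys `udc` and `references` and the function name `find_dangling_references` are hypothetical, and the sample notations are chosen only for the example.

```python
def find_dangling_references(records):
    """Return (udc_number, referenced_notation) pairs whose target is not in the file."""
    known = {rec["udc"] for rec in records}
    dangling = []
    for rec in records:
        for target in rec.get("references", []):
            if target not in known:
                dangling.append((rec["udc"], target))
    return dangling


# Tiny invented sample: "53" refers to "621.3", which is absent from the sample.
sample = [
    {"udc": "53", "references": ["54", "621.3"]},
    {"udc": "54", "references": ["53"]},
]
print(find_dangling_references(sample))  # [('53', '621.3')]
```

A real check would also have to handle references to ranges and to composed notations, which this sketch treats as plain strings.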

A minor but awkward problem is that Micro CDS/ISIS uses the apostrophe to delimit the search argument, so apostrophe auxiliaries interfere with searching. For the time being this has been circumvented by replacing the apostrophe with an inverted comma; for printed output it has to be restored by a search-and-replace in the text-processing software.
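The workaround amounts to a reversible character substitution. The Python sketch below shows the idea only; the text does not say which character was actually stored, so the backtick used as `STAND_IN` is an assumption, as are the function names and the sample notation.

```python
APOSTROPHE = "'"
STAND_IN = "`"  # assumed stand-in character; the one actually used is not specified


def to_stored(notation):
    """Replace apostrophes so the notation cannot clash with the search delimiter."""
    return notation.replace(APOSTROPHE, STAND_IN)


def to_printed(notation):
    """Restore apostrophes for printed output (the search-and-replace step)."""
    return notation.replace(STAND_IN, APOSTROPHE)


example = "546.32'13"  # illustrative notation with an apostrophe auxiliary
stored = to_stored(example)
assert APOSTROPHE not in stored
assert to_printed(stored) == example
```

The round trip is only safe as long as the stand-in character never occurs in a notation for any other reason.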

Merging

In the last stage of the creation of the Master Reference File the separate databases for the various sections had to be merged into one database file.

Before doing this a new database design had to be developed. Some fields in the former design were no longer functional, while others had to be added to register revisions and revision history. A copy of the original database files has been kept for future reference and to keep track of the sources.

The new design is so far more or less experimental (as noted, converting a database to a new format is relatively easy in Micro CDS/ISIS). It has the following field structure:

Descriptive fields

Field    Function and explanation
1        UDC number.
2        Table (codes).
         To allow for output in the correct sequence, each of the Tables of the Common Auxiliaries, and the Main Tables as a whole, had to be individually coded.
3        Type of special auxiliary, if applicable (codes).
         I.e. hyphen, point-nought, apostrophe, other (e.g. ...0/...9).
4        Combination type (codes).
         For composed notations this field should indicate the type of combination, i.e. with a colon or with a certain type of special auxiliary.
5        UDC number from which this number has been derived by parallel division (if applicable).
11       UDC number that is the source for parallel division of this number (if applicable).
         With subfields for the notation and accompanying text.
12       Type of special auxiliary (coded) introduced by the parallel division.
13       Type of special auxiliary introduced by an application note (see field 111).
100      Description: definition.
         With subfields for language versions.
105      Description: verbal examples.
         With subfields for language versions.
110      Scope note.
         To explain the semantic content of the description.
         With subfields for language versions.
111      Application note.
         For technical details about the application (e.g. applicable special auxiliaries).
115      Combination examples.
         With subfields for the notation of the example, and language versions of its description, annotation and references.
120      Examples of the parallel division of field 11.
         With subfields for notation and language versions of its description.
125      References.
         With subfields for notation and language versions of accompanying text.
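One visible change between the two designs is that separate German fields (210-262) give way to language subfields within a single field. The Python sketch below illustrates that kind of conversion. The tag correspondences in `SIMPLE` and `BILINGUAL` are inferred here by matching field descriptions in the two lists and are not documented mappings; the language keys "en" and "de" and the function name `convert` are likewise assumptions.

```python
# Assumed old-tag -> new-tag correspondences, inferred from matching descriptions.
SIMPLE = {100: 1, 40: 2, 24: 3}      # UDC number, table, type of special auxiliary
BILINGUAL = {110: 100, 112: 105}     # old English tag -> new tag; German is old tag + 100


def convert(old):
    """Convert a first-design record (dict tag -> value) to the merged design."""
    new = {}
    for old_tag, new_tag in SIMPLE.items():
        if old_tag in old:
            new[new_tag] = old[old_tag]
    for en_tag, new_tag in BILINGUAL.items():
        versions = {}
        if en_tag in old:
            versions["en"] = old[en_tag]
        if en_tag + 100 in old:
            versions["de"] = old[en_tag + 100]
        if versions:
            new[new_tag] = versions   # one field, one subfield per language
    return new


old_record = {100: "53", 40: "M", 110: "Physics", 210: "Physik"}
print(convert(old_record))
# {1: '53', 2: 'M', 100: {'en': 'Physics', 'de': 'Physik'}}
```

Keeping language versions as subfields means a further language can be added without declaring a new block of fields.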


Administrative fields

Field    Function and explanation
901      Date of introduction.
903      Source of introduction.
904      Comments on introduction.
911      Date of cancellation.
912      Replacement(s).
913      Source for cancellation.
914      Comments on cancellation.
921      Date of last revision.
922      Specification of revision, indicated by number(s) of revised field(s).
923      Source for revision.
924      Comments on revision.
925      Revision history, indicated by date and number of revised field.
951      Index only.
         Used for indexing UDC numbers that are not covered by the selection tables defined for other fields.
952      Note concerning the use of special characters.
         In CDS/ISIS many diacritics and special signs have to be coded. This cannot be done in the description itself, because it may disturb searching, so the coding is done in a separate field.
955      Editorial annotations and comments.
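The revision fields 921-925 suggest a simple bookkeeping pattern: each revision overwrites the "last revision" fields and appends to the history. The following Python sketch shows one way this could work. The function name `record_revision`, the ISO date strings, the list representation of field 925 and the source labels are all assumptions for illustration.

```python
def record_revision(rec, date, fields, source):
    """Register a revision on a record held as a dict of field tag -> value.

    921/922/923 describe the latest revision only; 925 accumulates the history
    as (date, revised field number) pairs.
    """
    revised = sorted(fields)
    rec[921] = date        # date of last revision
    rec[922] = revised     # numbers of the revised fields
    rec[923] = source      # source for revision
    rec.setdefault(925, []).extend((date, tag) for tag in revised)
    return rec


rec = {1: "53"}
record_revision(rec, "1993-06-01", {110, 100}, "hypothetical source A")
record_revision(rec, "1994-02-15", {125}, "hypothetical source B")
print(rec[921], rec[922])  # 1994-02-15 [125]
print(rec[925])
# [('1993-06-01', 100), ('1993-06-01', 110), ('1994-02-15', 125)]
```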

Content of the Master Reference File

As of its last update in 2004, the database contains a total of ca. 67,000 records (UDC class numbers). The data file (.MST) occupies ca. 13,000 KB and the index file (on UDC numbers and words) 5,900 KB. The distribution of the records by section and subject field is as follows (updated in May 2005):

SUBJECT COVERAGE

Table       Description (shortened)                               UDC numbers
Ia/k        Common auxiliaries, including:                              13378
  Ic        Language                                                     1365
  Id        Form                                                          360
  Ie        Place                                                        9054
  If        Ethnic                                                         38
  Ig        Time                                                          284
  Ik        Properties                                                   1553
  Ik        Materials                                                     151
  Ik        Processes                                                     333
  Ik        Persons                                                       240
0           Generalities ... Documentation. Librarianship etc.           1800
1           Philosophy. Psychology                                        824
2           Religion                                                     2215
3           Social Sciences, including:                                  6944
  30/32     General. Statistics. Sociology. Demography. Politics          962
  33        Economics                                                    2128
  34        Law                                                          1826
  35        Public Administration. Government                            1010
  36        Public Welfare                                                581
  37        Education                                                     234
  39        Folklore. Ethnography                                         194
5           Mathematics. Natural Sciences, including:                   11176
  51        Mathematics                                                  1033
  52        Astronomy                                                     625
  53        Physics                                                      1846
  54        Chemistry. Mineralogical Sciences                            3305
  55        Earth Sciences                                               1497
  56/59     Palaeontology. Biological Sciences                           2820
6           Applied Sciences. Medicine. Technology, including:          27659
  61        Medical Sciences                                             3170
  62/621.2  Technology in general. Heat Engines. Hydraulics              1576
  621.3     Electrical Engineering                                       1698
  621.4/.6  Heat Engines. Pneumatic Energy. Fluids Handling               477
  621.7/.9  Mechanical Technology                                        1486
  622       Mining                                                        679
  623       Military Engineering                                          618
  624/627   Civil Engineering                                            1522
  628       Public Health Engineering                                     497
  629       Transport Vehicle Engineering                                1779
  63        Agricultural Sciences                                        2273
  64        Home Economics                                                718
  65        Management and Organisation of Industry                      1387
  66        Chemical Technology                                          4455
  67/68     Various Industries and Crafts                                4576
  69        Building                                                      705
7           Arts. Recreation. Entertainment. Sport                       2596
8           Language. Linguistics. Literature                             616
9           Geography. Biography. History                                 246
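The figures in the table can be cross-checked with a few lines of arithmetic. The Python sketch below only re-adds numbers taken from the table; the dictionary names are arbitrary. It sums the top-level rows and confirms that the listed common auxiliary tables add up to the stated auxiliary total.

```python
# Top-level rows of the subject coverage table.
top_level = {
    "Ia/k": 13378, "0": 1800, "1": 824, "2": 2215, "3": 6944,
    "5": 11176, "6": 27659, "7": 2596, "8": 616, "9": 246,
}

# Sub-rows listed under the common auxiliaries.
auxiliaries = {
    "Ic Language": 1365, "Id Form": 360, "Ie Place": 9054, "If Ethnic": 38,
    "Ig Time": 284, "Ik Properties": 1553, "Ik Materials": 151,
    "Ik Processes": 333, "Ik Persons": 240,
}

total = sum(top_level.values())
print(total)  # 67454

# The auxiliary sub-rows account for the whole auxiliary total.
assert sum(auxiliaries.values()) == top_level["Ia/k"]
```

The total of 67,454 agrees with the figure of ca. 67,000 records given above. The sub-rows under classes 3, 5 and 6 are introduced by "including" and need not add up to their class totals.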

Further developments

As a database the MRF will certainly be useful in automated systems for cataloguing and information retrieval. Therefore, the UDC Consortium decided to make it available as such to interested libraries and documentation institutes.

At this stage of its development the MRF can be delivered as a database in Micro CDS/ISIS, as a file in ISO 2709 interchange format, and as a plain ASCII text file that can be loaded into a text processor. Other means of distribution, including special user applications for accessing the MRF, will be developed if they respond explicitly to users' needs and the necessary funding is available.
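For readers unfamiliar with ISO 2709, the Python sketch below shows the general shape of such a record: a 24-character leader, a directory of (tag, field length, start position) entries, and the field data. It is a simplified illustration, not a reader for the files actually delivered. It assumes ASCII text, a directory entry map of 4 and 5 digits, the control characters 0x1E and 0x1D as field and record terminators, and no line wrapping; an actual CDS/ISIS export may differ on these points. The function names are hypothetical, and the tags 1 and 100 follow the merged design listed earlier.

```python
FT = "\x1e"  # field terminator (assumed)
RT = "\x1d"  # record terminator (assumed)


def build_record(fields):
    """Build a simplified ISO 2709 record from a list of (tag: int, value: str)."""
    directory, data = "", ""
    for tag, value in fields:
        chunk = value + FT
        # Directory entry: 3-digit tag, 4-digit field length, 5-digit start position.
        directory += f"{tag:03d}{len(chunk):04d}{len(data):05d}"
        data += chunk
    directory += FT
    base = 24 + len(directory)          # base address of the data
    total = base + len(data) + 1        # +1 for the record terminator
    leader = f"{total:05d}" + " " * 7 + f"{base:05d}" + " " * 3 + "4500"
    return leader + directory + data + RT


def parse_record(raw):
    """Parse a record produced by build_record back into (tag, value) pairs."""
    base = int(raw[12:17])
    len_len, start_len, impl_len = int(raw[20]), int(raw[21]), int(raw[22])
    entry_len = 3 + len_len + start_len + impl_len
    directory = raw[24:base - 1]        # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), entry_len):
        entry = directory[i:i + entry_len]
        length = int(entry[3:3 + len_len])
        start = int(entry[3 + len_len:3 + len_len + start_len])
        value = raw[base + start:base + start + length - 1]  # drop terminator
        fields.append((int(entry[:3]), value))
    return fields


raw = build_record([(1, "53"), (100, "Physics")])
assert parse_record(raw) == [(1, "53"), (100, "Physics")]
```

Because field lengths in ISO 2709 are counted in bytes, a version handling non-ASCII text would need to work on encoded bytes rather than on Python strings.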

The UDC Consortium would be very grateful for suggestions from users in this regard.

The MRF database will be the core material for all editions of the UDC in whatever language, size and form, and on whatever medium. It is also the starting point for all future revisions and enhancements of the UDC. The UDC Consortium intends to approach the revision process in a more structured way and to shorten the revision procedures.

The needs and wishes of the users of the UDC will remain the most important source for revision, for the UDC should be their tool and not an end in itself. User clubs might be the vehicle for their comments and suggestions.

However, users should realize that maintenance and enhancement of the UDC require not only the involvement and enthusiasm of users, but also money for revision work, staffing and equipment.

It would be disappointing for all those involved in the creation of the MRF if this project could not be continued and further developed, if this first step were not followed by many others.

 

       

